Introduction


Welcome to my portfolio,

I’m Reginald van Putt and this is my portfolio for the course Computational Musicology (UvA). In this portfolio I investigate a corpus that I created with the help of my peers and friends. The corpus consists of the top 5 songs of people from all kinds of bachelors. With this corpus I want to find out whether there is a significant difference in music taste between study directions, looking specifically at valence, energy and loudness.

Why?

I wanted to investigate this because I quite often ask people about their music taste. I have also taken a few courses at different faculties, and it seemed to me that different groups (e.g. FNWI vs AI) have quite different tastes in music. That is why I wanted to analyse more data and see whether there is a significant difference. I initially wanted to include genre in the analysis as well, but this turned out to be a lot more complicated than I had thought, so I dropped that part of the analysis.

The graphs below show all the data in my corpus. The x-axis shows the valence and the y-axis the energy of the songs; the size of each dot reflects the loudness of the song. The songs plotted are the top 5 songs of many peers and friends. The colors are based on the faculty, and the graphs are also split by the faculty/direction of the participants' bachelors.

Main corpus visualization


Faculty of Humanities

Faculty of Science

Social and Behavior Studies

Technical University Studies

Explanation of the graphs

Unfortunately, due to some glitches I was not able to place the text next to the graphs, so here are all the details and explanations of the graphs:

Faculty of Humanities:

This graph shows the valence on the x-axis and the energy on the y-axis; the color indicates the bachelor and the size of each dot reflects the loudness. Note that this graph contains a lot fewer bachelors than the other faculty graphs. However, each bachelor has more than 1 participant, so the graph contains more than just 2 data sets. The pattern is quite similar to most data sets in this format: a lot of songs in the upper half and on the left side of the graph, and almost nothing in the bottom right. There is 1 song in that corner, but I kept it in the data because I do not think outliers like this should be removed from the representations or calculations.

Faculty of Science:

This graph shows the valence on the x-axis and the energy on the y-axis; the color indicates the bachelor and the size of each dot reflects the loudness. The pattern is quite similar to most data sets in this format: a lot of songs in the upper half and on the left side of the graph, and almost nothing in the bottom right. However, I also see few songs in the bottom left corner, which might mean that the average energy of this faculty is higher than in typical data sets, because the bottom half is so sparse.

Social and Behavior Studies:

This graph shows the valence on the x-axis and the energy on the y-axis; the color indicates the bachelor and the size of each dot reflects the loudness. Most data sets in this format show a lot of songs in the upper half and on the left side of the graph, and almost nothing in the bottom right. This data set follows that pattern, so the data seem quite average. It looks like this data set has a higher average loudness than the other faculties, which might indicate that the energy is also slightly higher, because loudness is (partially) correlated with energy.

Technical University Studies:

This graph shows the valence on the x-axis and the energy on the y-axis; the color indicates the bachelor and the size of each dot reflects the loudness. The pattern is quite similar to most data sets in this format: a lot of songs in the upper half and on the left side of the graph, and almost nothing in the bottom right. This graph, however, also shows few data points in the bottom left, suggesting a higher average energy than typical data sets and than some of the other data sets in this corpus.

Visualizations


Chromagram


The left song may have lower energy and valence because it contains fewer notes: the number of notes looks quite a bit lower than in the right song (the average song). The valence might also be lower because the outlier song is mostly minor while the average song is mostly major (according to Spotify), which you might be able to see in the chords played throughout the song.

Key analysis

Low

High

Random


The keygrams are quite interesting to compare. The lowest-valence song shows a vertical band between 150 and 200 seconds, meaning that more keys are being used in that period than in the rest of the song. Throughout the song the keys used most of the time stay roughly the same, but the quantity/certainty differs a bit. The highest-valence song is quite different: not many keys are used, but the keys that are used are really stable through the whole song. The random song is very uniform throughout, with some slight differences such as in the intro and in 2 sections that are probably bridges of some sort.

Chord analysis

Low

High

Random


There seems to be a very big overlap between the chords used in all three songs. Almost all (if not all) chords seem to be the same, judging by the bluest parts of the chordgrams. This surprised me, since the valence of these songs is quite different and not all songs are of the same genre (so they are not just generic pop songs).
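
A chord graph of this kind is typically built by comparing each chroma vector with a template for every chord. The Python sketch below assumes simple binary triad templates and cosine similarity; the function names and the example chroma vector are hypothetical and are not the code or data behind the graphs above.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root, minor=False):
    """Binary 12-bin template with root, third and fifth set to 1."""
    third = 3 if minor else 4
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template

def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_chord(chroma):
    """Return the name of the triad whose template best matches a chroma vector."""
    scores = {}
    for root in range(12):
        name = NOTE_NAMES[root]
        scores[name + ":maj"] = cosine_similarity(chroma, triad_template(root))
        scores[name + ":min"] = cosine_similarity(chroma, triad_template(root, minor=True))
    return max(scores, key=scores.get)

# Hypothetical chroma vector with most energy on A, C and E.
example_chroma = [0.9, 0.0, 0.1, 0.0, 0.8, 0.1, 0.0, 0.1, 0.0, 1.0, 0.0, 0.0]
print(best_chord(example_chroma))
```

Related triads share two of their three notes (C major and A minor both contain C and E), so several templates tend to score similarly. That may be one reason why the chord graphs of different songs look alike.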

Tempo analysis


Novelty graph of loudness

Low valence

High valence


Here you see two songs from the corpus, more specifically two songs from the Faculty of Science category. The first graph is a novelty graph of the loudness of the song with the lowest valence, and the second graph is a novelty graph of the loudness of the song with the highest valence (both according to Spotify's API). Note that the novelty of the high-valence song is quite even over the whole song, while the low-valence song has a lot of really high peaks. The large difference between peaks and lows in the low-valence song arises because the song uses many changes in loudness to highlight its emotions, mostly through long passages without vocals and sudden increases in piano volume. I think that this also explains the low valence. The high-valence song is a rap song. It starts with some adlibs, music and bird sounds, which are extra loud compared to the instrumentals in the rest of the song. This explains the peak in loudness at the start. On average, rap songs have pretty even loudness throughout, which explains the rest of the graph.

Something to note is that both graphs only show 180 seconds of the song. This is due to calculation times and because it would get too messy if the full songs were shown.

The 2 songs used are: "Rosebud" by Man-Made-Sunshine (low valence) and "Nas is like" by Nas (high valence).
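
A loudness novelty curve is commonly computed as the half-wave rectified difference between consecutive loudness values, so that only increases in loudness produce peaks. The Python sketch below assumes that definition; the function name and the loudness values are made up for illustration and are not taken from either song.

```python
def loudness_novelty(loudness_db):
    """Half-wave rectified first difference: keep rises in loudness, drop falls."""
    return [
        max(current - previous, 0.0)
        for previous, current in zip(loudness_db, loudness_db[1:])
    ]

# Hypothetical loudness values in dB: a quiet passage with a sudden swell,
# and a steady section with small, regular changes.
quiet_then_swell = [-30.0, -29.0, -30.0, -12.0, -12.5, -12.0]
steady_beat = [-8.0, -7.5, -8.0, -7.5, -8.0, -7.5]

print(loudness_novelty(quiet_then_swell))
print(loudness_novelty(steady_beat))
```

Under this definition a song with sudden swells yields a few tall peaks, while an even beat yields many small peaks of similar height.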

Tempogram

Low valence

High valence


The tempograms shown above both have the time of the song on the x-axis and the tempo in beats per minute (BPM) on the y-axis. The color gradient shows how likely each tempo is to match (yellow being more likely than blue). The first graph shows the same low-valence song as in the novelty graph of loudness. This song has a pretty clear line around 85 BPM but also a lot of clutter all over the graph. This is most likely because of the lack of percussion and the inconsistent singing, which make it hard for the algorithm to figure out the tempo.

The high-valence song (also the same as in the novelty graph of loudness) has a pretty clear line around 95 BPM and almost no clutter compared to the low-valence song, probably because the same general beat is used throughout the whole song. The lines are slightly more blurry at the start and end because the intro and outro are a bit different.
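
One common way to obtain the tempo strengths shown in a tempogram is to autocorrelate a novelty curve at the lags that correspond to candidate tempos. The Python sketch below assumes that approach and applies it to a whole signal rather than to short time windows; the function name, the sampling rate and the synthetic pulse train are hypothetical.

```python
def tempo_strengths(novelty, sample_rate, bpm_candidates):
    """Autocorrelation of the novelty curve at the lag matching each candidate tempo."""
    strengths = {}
    for bpm in bpm_candidates:
        lag = round(sample_rate * 60.0 / bpm)  # beat period in samples
        strengths[bpm] = sum(
            novelty[i] * novelty[i + lag] for i in range(len(novelty) - lag)
        )
    return strengths

# Synthetic novelty curve: one pulse every 0.6 s at 100 samples per second.
sample_rate = 100
novelty = [1.0 if i % 60 == 0 else 0.0 for i in range(3000)]

strengths = tempo_strengths(novelty, sample_rate, range(60, 181))
estimated_bpm = max(strengths, key=strengths.get)
print(estimated_bpm)
```

With a clear, regular pulse one tempo stands out. Without steady percussion the strength spreads over many tempos, which shows up as clutter in the graph.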

Dendrogram


Dendrogram of the Faculty of Science dataset


Explanation of the graph

When looking at this dendrogram I immediately notice that a lot of clusters consist of 5 songs. Since everyone who provided data gave me 5 songs, this makes a lot of sense, mainly because a lot of people have matching genres or even performers in their top 5 songs. I originally wanted to include the other faculties as well and give each its own color. This, however, was a challenge that I could not overcome: I spent approximately 10 hours on it and nothing seemed to work. So I settled for this representation.
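
A dendrogram records the order in which hierarchical clustering merges songs. The Python sketch below assumes average linkage on Euclidean distances between (valence, energy) pairs; the feature values are invented for illustration, and the dendrogram above may have used different features and a different linkage.

```python
import math

def average_linkage(points):
    """Agglomerative clustering; returns the merges as (members_a, members_b, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for index, a in enumerate(ids):
            for b in ids[index + 1:]:
                # Average distance over all pairs of members from the two clusters.
                distances = [math.dist(points[p], points[q])
                             for p in clusters[a] for q in clusters[b]]
                average = sum(distances) / len(distances)
                if best is None or average < best[2]:
                    best = (a, b, average)
        a, b, distance = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), distance))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical (valence, energy) values: songs 0-2 from one listener, 3-4 from another.
songs = [(0.20, 0.30), (0.22, 0.33), (0.25, 0.28), (0.80, 0.85), (0.75, 0.90)]
for left, right, distance in average_linkage(songs):
    print(left, right, round(distance, 3))
```

In this toy example the songs of each listener merge with each other before the two listeners are joined, which is the kind of behaviour that would produce clusters of about 5 songs in the real data.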

Statistical analysis


Post hoc results energy

$statistics
     MSerror  Df      Mean       CV
  0.03743911 126 0.6225362 31.08123

$parameters
   test  name.t ntr StudentizedRange alpha
  Tukey faculty   4         3.682115  0.05

$means
                                energy       std  r         se    Min   Max
Faculty of Humanities        0.5392567 0.2374104 30 0.03532663 0.0827 0.883
Faculty of Science           0.6809333 0.1782060 45 0.02884407 0.1360 0.932
Social and Behavior studies  0.6186333 0.2020701 30 0.03532663 0.2350 0.921
Technical University studies 0.6220400 0.1445278 25 0.03869838 0.3110 0.885
                                Q25    Q50    Q75
Faculty of Humanities        0.3345 0.5525 0.7575
Faculty of Science           0.5610 0.7080 0.8440
Social and Behavior studies  0.5140 0.6305 0.7900
Technical University studies 0.5160 0.6100 0.7040

$comparison
NULL

$groups
                                energy groups
Faculty of Science           0.6809333      a
Technical University studies 0.6220400     ab
Social and Behavior studies  0.6186333     ab
Faculty of Humanities        0.5392567      b

attr(,"class")
[1] "group"

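These are the results of the post hoc analysis, specifically the Tukey test. The $comparison value is NULL because the test was run in grouping mode, so the result has to be read from the letters under $groups: faculties that share a letter do not differ significantly at alpha = 0.05. The Faculty of Science (a) and the Faculty of Humanities (b) share no letter, so their energy differs significantly. Technical University studies and Social and Behavior studies (both ab) do not differ significantly from any other group.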
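
For unequal group sizes, the Tukey test compares each pairwise difference in means with a critical difference computed from the studentized range, the error mean square and the two group sizes (the Tukey-Kramer form). The Python sketch below recomputes that threshold from the values printed in the energy output above; the function name is hypothetical, and the formula is assumed to match what the R function used.

```python
import math

def tukey_critical_difference(q_critical, ms_error, n_a, n_b):
    """Tukey-Kramer threshold for the difference between two group means."""
    return q_critical * math.sqrt(ms_error / 2.0 * (1.0 / n_a + 1.0 / n_b))

# Values copied from the energy output above.
Q_CRITICAL = 3.682115   # StudentizedRange
MS_ERROR = 0.03743911   # MSerror
groups = {              # faculty: (mean energy, number of songs r)
    "Faculty of Science": (0.6809333, 45),
    "Technical University studies": (0.6220400, 25),
    "Social and Behavior studies": (0.6186333, 30),
    "Faculty of Humanities": (0.5392567, 30),
}

names = list(groups)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        (mean_a, n_a), (mean_b, n_b) = groups[first], groups[second]
        threshold = tukey_critical_difference(Q_CRITICAL, MS_ERROR, n_a, n_b)
        difference = abs(mean_a - mean_b)
        print(f"{first} vs {second}: difference {difference:.3f}, "
              f"threshold {threshold:.3f}, exceeds: {difference > threshold}")
```

Pairs whose difference exceeds the threshold are the ones the test places in different letter groups under $groups.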

Post hoc results valence

$statistics
     MSerror  Df      Mean      CV
  0.05274719 126 0.4876946 47.0925

$parameters
   test  name.t ntr StudentizedRange alpha
  Tukey faculty   4         3.682115  0.05

$means
                               valence       std  r         se    Min   Max
Faculty of Humanities        0.3325633 0.2139572 30 0.04193137 0.0376 0.915
Faculty of Science           0.5602222 0.2265838 45 0.03423682 0.1590 0.929
Social and Behavior studies  0.4976467 0.2548524 30 0.04193137 0.0994 0.875
Technical University studies 0.5313600 0.2213671 25 0.04593351 0.1280 0.912
                               Q25    Q50     Q75
Faculty of Humanities        0.187 0.2740 0.41900
Faculty of Science           0.350 0.5800 0.74100
Social and Behavior studies  0.275 0.4825 0.74075
Technical University studies 0.299 0.5660 0.68600

$comparison
NULL

$groups
                               valence groups
Faculty of Science           0.5602222      a
Technical University studies 0.5313600      a
Social and Behavior studies  0.4976467      a
Faculty of Humanities        0.3325633      b

attr(,"class")
[1] "group"

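These are the results of the post hoc analysis, specifically the Tukey test. The $comparison value is NULL because the test was run in grouping mode, so the result has to be read from the letters under $groups: faculties that share a letter do not differ significantly at alpha = 0.05. The Faculty of Humanities is the only group with the letter b, so its valence is significantly lower than that of the three other groups. The other three groups (all a) do not differ significantly from each other.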

Post hoc results loudness

$statistics
  MSerror  Df      Mean        CV
  11.9608 126 -7.425885 -46.57275

$parameters
   test  name.t ntr StudentizedRange alpha
  Tukey faculty   4         3.682115  0.05

$means
                              loudness      std  r        se     Min    Max
Faculty of Humanities        -9.366167 4.610664 30 0.6314217 -22.895 -3.497
Faculty of Science           -6.467311 3.298074 45 0.5155536 -19.203 -0.210
Social and Behavior studies  -6.976967 3.214510 30 0.6314217 -15.892 -3.046
Technical University studies -7.361680 2.163257 25 0.6916878 -14.469 -3.997
                                  Q25    Q50      Q75
Faculty of Humanities        -11.2040 -9.026 -5.82375
Faculty of Science            -7.8070 -6.013 -4.67200
Social and Behavior studies   -9.0205 -5.862 -4.81225
Technical University studies  -8.4000 -7.066 -5.65500

$comparison
NULL

$groups
                              loudness groups
Faculty of Science           -6.467311      a
Social and Behavior studies  -6.976967      a
Technical University studies -7.361680     ab
Faculty of Humanities        -9.366167      b

attr(,"class")
[1] "group"

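These are the results of the post hoc analysis, specifically the Tukey test. The $comparison value is NULL because the test was run in grouping mode, so the result has to be read from the letters under $groups: faculties that share a letter do not differ significantly at alpha = 0.05. The Faculty of Humanities (b) shares no letter with the Faculty of Science or Social and Behavior studies (both a), so its songs are significantly less loud than theirs. Technical University studies (ab) does not differ significantly from any other group.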

Discussion/Conclusion

I did a few statistical analyses on the data of my corpus, looking specifically at energy, valence and loudness. The letter groups in the post hoc results (the Tukey test) show that most faculties do not differ significantly from each other, with one exception: the Faculty of Humanities. Its songs have significantly lower valence than those of all three other groups, significantly lower energy than those of the Faculty of Science, and significantly lower loudness than those of the Faculty of Science and Social and Behavior studies. Between the other three groups there are no significant differences on average, although the differences between individual songs are clearly there. This is quite interesting, because before I started gathering data I really thought there would be significant differences across the board, but the moment I made my first graph I noticed that the differences are not that big on average. That might be because of the small sample sizes and because certain genres are on average more popular than others, which makes it likely that people are fans of the same genre and therefore have similar songs in terms of energy, valence and loudness.

I did, however, learn that tempo analysis and novelty graphs can give a lot of information and insight into a song, and that they make a lot of sense if you match them to a certain genre (as I mentioned in the corresponding tabs). The chromagram, key and chord analyses also made a lot of sense in combination with the type of song (mainly minor or major). That supports the values reported by the Spotify API, and it is nice to know that it does things well.
